Abstract

We provide a complete thermodynamic solution of a 1D hopping model in the
presence of a random potential by obtaining the density of states. Since the
partition function is related to the density of states by a Laplace transform,
the density of states completely determines the thermodynamic behavior of the
system. We also show that the transfer matrix technique, or so-called dynamic
programming, used to obtain the density of states in the 1D hopping model may
be generalized to tackle a long-standing problem in statistical significance
assessment for one of the most important proteomic tasks: peptide sequencing
using tandem mass spectrometry data.
1 Introduction

Important in both fundamental science and numerous applications, optimization
problems of various degrees of complexity are challenging (see [1] for an
excellent introduction). Optimization conditioned by constraints that may vary
from event to event is of particular theoretical and practical importance. As
a first example, when dealing with a system under a random potential, each
realization of the random potential demands a separate optimization resulting
in a different ground state. The thermodynamic behavior of such a system in a
quenched random potential crucially depends on the random potential realized.
A similar but practical problem may arise in routing passengers at various
cities to reach their destinations. In the latter case, the optimal routing
depends on the number of passengers at various locations and on the costs from
one location to the others, both of which are likely to vary from time to
time. This type of conditional optimization also occurs in a modern proteomics
problem, namely mass spectrometry (MS) based peptide sequencing. In this case,
each tandem MS (MS$^2$) spectrum constitutes a different condition for
optimization, which aims to find a database peptide or a de novo peptide that
best explains the given MS$^2$ spectrum.

When the cost function of an optimization problem can be expressed as a sum of
independent local contributions, the problem can usually be solved using the
transfer matrix method that is commonly employed in statistical physics. A
well-studied example of this sort in statistical physics is the directed
polymer/path in a random medium (DPRM) [2,3,4]. Even when a small nonlocal
energetics is involved, the transfer matrix approach still proves useful [5].
As an example, the close relationship between the DPRM problem and MS-based
peptide sequencing, where a small nonlocal energetics is necessary to enhance
the peptide identifications, was sketched in an earlier publication [5], where
the cost value distribution from many possible solutions other than the
optimal one was also explored. Indeed, obtaining the cost value distribution
from all possible solutions is in many cases harder than finding the optimal
solution alone. In this paper, we provide the solution to a generic problem
that enables a full characterization of the peptide sequencing score
statistics, instead of just the optimal peptide. The 1D problem considered is
essentially a hopping model in the presence of a random potential. The
solution to this problem may also be useful in other applications such as the
routing of passengers and even internet traffic.

In what follows, we first introduce the generic 1D hopping model in a random
potential, followed by its transfer matrix (or dynamic programming) solution.
We then discuss the utility of this solution in the context of MS-based
peptide sequencing and demonstrate it with a real mass spectrum from an
MS-based proteomics experiment. In the discussion section, we sketch the
utility of the transfer matrix solution in other contexts and then conclude
with a few relevant remarks.



2 1D Hopping in a Random Potential

Along the $x$-axis, let us consider a particle that can hop by a set of
prescribed distances $\{m_i\}_{i=1}^{K}$ in the positive $x$ direction. That
is, if the particle is currently at location $x_0$, it can move to location
$x_0 + m_1$, $x_0 + m_2$, $\ldots$, or $x_0 + m_K$ in the next time step. At
each hopping step, the particle accumulates an energy $-s(x)$ from the
location $x$ that it just visited. The score $s(x)$ (the negative of the
on-site potential energy) is assumed positive and may exist only at a limited
number of locations. For locations where $s(x)$ does not exist, we simply set
$s(x) = 0$ there. A path starting from the origin and specified by the
sequential hopping events $p = \{m_{h_1}, m_{h_2}, \ldots, m_{h_L}\}$ visits
the locations $\{x_1, x_2, \ldots, x_L\}$ with $x_i = \sum_{j=1}^{i} m_{h_j}$
and has energy

$$ E_p(x = x_L) = -\sum_{i=1}^{L-1} s(x_i) \equiv -S_p(x) . $$



In general, more than one path can terminate at the same point. Treating each
path as a state with energy given by $E_p$, one obtains the following
recursion relation for the partition function
$Z(x) = \sum_p e^{-\beta E_p(x)}$:

$$ Z(x) = \sum_{i=1}^{K} e^{\beta s(x - m_i)} Z(x - m_i) , \tag{1} $$

where $\beta = 1/T$ plays the role of the inverse temperature (with $k_B = 1$
chosen). If one were interested only in the best score terminating at point
$x$, it would be given by the zero temperature limit $\beta \to \infty$. The
corresponding recursion relation is obtained by taking the logarithm of both
sides of (1), dividing by $\beta$, and then taking the $\beta \to \infty$
limit to reach



$$ S_{\rm best}(x) = \max_{1 \le i \le K} \{ s(x - m_i) + S_{\rm best}(x - m_i) \} , \tag{2} $$



where $S_{\rm best}(x)$ records the best path score among all paths reaching
position $x$. This update method, also termed dynamic programming, records the
lowest energy and the lowest energy path reaching a given point $x$. The
lowest energy among all possible paths at position $x$ is simply
$-S_{\rm best}(x)$, and the associated path can be obtained by tracing the
incoming steps backwards. It is interesting to observe that one can also
obtain the worst score at each position via dynamic programming:

$$ S_{\rm worst}(x) = \min_{1 \le i \le K} \{ s(x - m_i) + S_{\rm worst}(x - m_i) \} . \tag{3} $$
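
Recursions (1)-(3) can be sketched in a few lines of Python. This is only an
illustrative sketch under stated assumptions: positions are non-negative
integers, the origin carries the single empty path with $Z(0) = 1$, and the
names `hop_recursions`, `s`, and `hops` are hypothetical rather than taken
from any published code.

```python
import math

def hop_recursions(s, hops, x_max, beta):
    """Run recursions (1)-(3) on integer positions 0..x_max.

    s    : dict position -> score s(x); missing positions count as 0
    hops : allowed hop lengths m_1..m_K (positive integers)
    Returns (Z, S_best, S_worst) as lists indexed by position.
    Unreachable positions keep Z = 0, S_best = -inf, S_worst = +inf.
    """
    Z = [0.0] * (x_max + 1)
    best = [-math.inf] * (x_max + 1)
    worst = [math.inf] * (x_max + 1)
    Z[0], best[0], worst[0] = 1.0, 0.0, 0.0      # the empty path at the origin
    for x in range(1, x_max + 1):
        for m in hops:
            prev = x - m
            if prev < 0 or best[prev] == -math.inf:
                continue                          # no path arrives through prev
            gain = s.get(prev, 0.0)               # score of the site just left
            Z[x] += math.exp(beta * gain) * Z[prev]           # eq. (1)
            best[x] = max(best[x], gain + best[prev])         # eq. (2)
            worst[x] = min(worst[x], gain + worst[prev])      # eq. (3)
    return Z, best, worst
```

For large `beta`, `math.log(Z[x]) / beta` approaches `best[x]`, which is the
zero temperature limit described above.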



The full thermodynamic characterization demands more information than the
ground state energy. In principle, one may obtain the full partition function
using eq. (1) evaluated at various temperatures. This procedure, however,
hinders the analytical study of properties such as the average energy.

A better starting point is available if one can obtain the density of states
$D(E)$. In this case, we have



$$ Z = \int dE \, e^{-\beta E} D(E) , \qquad
   \langle E \rangle = \frac{\int dE \, e^{-\beta E} E \, D(E)}{\int dE \, e^{-\beta E} D(E)} . $$

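Note that if the energy of the system is bounded from below by the ground
state energy $E_{\rm grd}$, the partition function is simply a Laplace
transform of a modified density of states:

$$ Z = e^{-\beta E_{\rm grd}} \int_0^\infty dE \, e^{-\beta E} \tilde{D}(E) , $$

where $\tilde{D}(E) = D(E + E_{\rm grd})$ measures energies from the ground
state, and

$$ \langle E \rangle = E_{\rm grd} +
   \frac{\int_0^\infty dE \, e^{-\beta E} E \, \tilde{D}(E)}{\int_0^\infty dE \, e^{-\beta E} \tilde{D}(E)} . $$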



This implies that the density of states $D(E)$ together with the ground state
energy $E_{\rm grd}$ determines all the thermodynamic behavior of the system.
In the next section, we explain how to obtain the density of states using the
dynamic programming technique, as well as how to extend this approach to more
complicated situations that will be useful in characterizing the score
statistics in MS-based peptide sequencing.
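
As a numerical illustration of how the density of states fixes the
thermodynamics, the sketch below evaluates $\ln Z$ and $\langle E \rangle$
from a binned density of states. It assumes that energies are integer
multiples of a bin size `eta`, so that a state with score index $N$ has energy
$-N\eta$; the function and argument names are hypothetical.

```python
import math

def thermodynamics_from_dos(counts, eta, beta):
    """Return (ln Z, <E>) from a binned density of states.

    counts : dict N -> number of states with energy E = -N * eta
    """
    energies = {n: -n * eta for n, c in counts.items() if c > 0}
    e_grd = min(energies.values())                 # ground state energy
    # Measure energies from the ground state: every exponent is <= 0,
    # so the Boltzmann weights cannot overflow.
    weights = {n: counts[n] * math.exp(-beta * (e - e_grd))
               for n, e in energies.items()}
    z_shifted = sum(weights.values())
    mean_shift = sum(w * (energies[n] - e_grd)
                     for n, w in weights.items()) / z_shifted
    log_z = -beta * e_grd + math.log(z_shifted)
    return log_z, e_grd + mean_shift
```

Factoring out the ground state term mirrors the shifted Laplace transform
above and keeps the sum numerically stable at low temperature.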



3 Obtaining the Density of States 



The density of states is related to the energy histogram in a simple way. The
number of states between energies $E$ and $E + \eta$ (with $\eta \ll 1$) is
given by $D(E)\eta$. If we use $\eta$ as the bin size of the energy histogram,
the count $C(E)$ in the bin with energy $E$ is simply $D(E)\eta$, and the
density of states is $D(E) = C(E)/\eta$. For simplicity, we assume that all
the on-site energies $-s(x)$ are integer multiples of $\eta$. This implies
that each path energy/score is also an integer multiple of $\eta$. In the
following subsections, we use the score density of states instead of the
energy density of states.
3.1 The Simplest Case and its Application

We denote by $C(x, N)$ the number of paths reaching position $x$ with score
$N\eta$. With this notation, we can easily write down the recursion relation
for $C(x, N)$ as follows:

$$ C(x, N) = \sum_{i=1}^{K} C\left(x - m_i, \, N - \frac{s(x - m_i)}{\eta}\right) . \tag{4} $$

This recursion relation allows us to compute the density of states in the same
manner as computing the partition function (1), except that we need an
additional dimension for the score at each position $x$. As an even simpler
application of this recursion relation, suppose that one is interested only in
the number of paths reaching position $x$. One may then sum over the score
index on both sides of (4) and arrive at

$$ C(x) = \sum_{i=1}^{K} C(x - m_i) , \tag{5} $$



which enables a very speedy way to compute the total number of paths reach- 
ing position x. In the context of de novo peptide sequencing [6], this number 
corresponds to the total number of all possible de novo peptides within a given 
small mass range. Although simply obtained, this number may be useful for 
providing rough statistical assessment in de novo peptide sequencing. 
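
Recursions (4) and (5) might be implemented along the following lines. The
sketch assumes integer positions and scores already expressed as integers in
units of $\eta$; `score_histograms` and `path_counts` are hypothetical names
used only for illustration.

```python
from collections import Counter

def score_histograms(s, hops, x_max):
    """Recursion (4): hist[x][N] = number of paths reaching x with score N*eta.

    s    : dict position -> integer score in units of eta (missing -> 0)
    hops : allowed hop lengths m_1..m_K
    """
    hist = [Counter() for _ in range(x_max + 1)]
    hist[0][0] = 1                                # one empty path at the origin
    for x in range(1, x_max + 1):
        for m in hops:
            prev = x - m
            if prev < 0:
                continue
            gain = s.get(prev, 0)                 # score of the site just left
            for n, count in hist[prev].items():
                hist[x][n + gain] += count
    return hist

def path_counts(hops, x_max):
    """Recursion (5): total number of paths reaching each position."""
    c = [0] * (x_max + 1)
    c[0] = 1
    for x in range(1, x_max + 1):
        c[x] = sum(c[x - m] for m in hops if x - m >= 0)
    return c
```

Summing `hist[x]` over all score indices reproduces `path_counts(hops, x_max)[x]`,
which is the step that leads from (4) to (5).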



3.2 The More Realistic Case 

In general, one may wish to associate an energy $h$ with each hop, or one may
wish to introduce some kind of score normalization based on the number of
hopping steps. This is indeed the case when applying this framework to
MS-based peptide sequencing, where adding a peptide length factor to, or
multiplying it with, the overall raw score is a common practice. In this case,
it becomes important to keep track of the number of hops made in each path. We
may further decompose the counter $C(x, N)$ as $\sum_L C(x, N, L)$. That is,
we may separate the paths with different numbers of steps from one another and
arrive at a finer counter $C(x, N, L)$, which records the number of paths
reaching position $x$ with score $N\eta$ and with $L$ hopping steps.

It is rather easy to write down the recursion relation obeyed by this fine
counter:

$$ C(x, N, L) = \sum_{i=1}^{K} C\left(x - m_i, \, N - \frac{s(x - m_i)}{\eta}, \, L - 1\right) . \tag{6} $$



This recursion relation allows us to renormalize the raw score based on the
number of steps taken. For example, for RAId_DbS [7], a database search method
we developed, we divide the raw score by $2(L - 1)$ for any peptide (path) of
$L$ amino acids (hopping steps) to get better sensitivity in peptide
identification.
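
A minimal sketch of recursion (6) is given below, again assuming integer
positions and integer scores in units of $\eta$. The names `fine_counter` and
`marginal_over_length` are hypothetical; the second helper only illustrates
that summing the fine counter over $L$ recovers $C(x, N)$.

```python
from collections import Counter

def fine_counter(s, hops, x_max):
    """Recursion (6): fine[x][(N, L)] = number of paths reaching x
    with score N*eta after exactly L hops."""
    fine = [Counter() for _ in range(x_max + 1)]
    fine[0][(0, 0)] = 1                           # empty path: score 0, 0 hops
    for x in range(1, x_max + 1):
        for m in hops:
            prev = x - m
            if prev < 0:
                continue
            gain = s.get(prev, 0)                 # score of the site just left
            for (n, length), count in fine[prev].items():
                fine[x][(n + gain, length + 1)] += count
    return fine

def marginal_over_length(fine_at_x):
    """Sum the fine counter over L to recover C(x, N)."""
    out = Counter()
    for (n, _length), count in fine_at_x.items():
        out[n] += count
    return out
```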

In principle, the recursion relations given by (4-6) are all one-dimensional
updates. The only difference is the internal structure of the counters at each
position $x$. For (5), the counter is just an integer and has no further
structure. For (4), the counter at each position has a 1D structure indexed by
the score. For (6), the counter at each position $x$ has a 2D structure
indexed by both the score and the number of hopping steps. In terms of solving
the problem using dynamic programming, it is therefore always a 1D dynamic
programming with different degrees of internal structure, which lengthens the
execution time when shifting from the simplest case (5) to the more
complicated case (6). At each position $x$, there is an upper bound and a
lower bound for the score and also for the number of hopping steps
accumulated. We shall call them $S_{\rm best}(x)$, $S_{\rm worst}(x)$,
$L_{\max}(x)$, and $L_{\min}(x)$, respectively.
The first two quantities may be obtained from eqs. (2) and (3), respectively.
We provide the recursions for the latter two quantities below:

$$ L_{\max}(x) = \max_{1 \le i \le K} \{ L_{\max}(x - m_i) \} + 1 , \tag{7} $$

$$ L_{\min}(x) = \min_{1 \le i \le K} \{ L_{\min}(x - m_i) \} + 1 . \tag{8} $$

Eqs. (2-3) and (7-8) provide the ranges for both the scores and the number of
cumulative hopping steps at each position $x$ via simple dynamic programming.
As we will discuss later, this information enables a memory-efficient
computation of score histograms.



4 Application in MS-based Peptide Sequencing

In this section, we focus on an important subject in modern biology: using MS
data to identify the numerous peptides/proteins involved in any given
biological process. Because of the peptide mass degeneracies and the limited
measurement accuracy for the peptide mass-to-charge ratio, using MS$^2$
spectra is more effective in peptide identifications. In an MS$^2$ setup, a
selected peptide with its mass identified by the first spectrometer is
fragmented by a noble gas, and the resulting fragments are analyzed by a
second mass spectrometer. Although such MS$^2$-based proteomics approaches
promise high-throughput analysis, the confidence level assignment for any
peptide/protein identified is challenging.

The majority of peptide identification methods are so-called database search
approaches. The main idea is to theoretically fragment each peptide in a
database to obtain the corresponding theoretical spectrum. One then determines
the degree of similarity between each theoretical spectrum and the input query
spectrum using a scoring function. The candidate peptides from the database
are ranked/chosen according to their similarity scores to the query spectrum.
Although one may assign relative confidence levels among the candidate
peptides via various (empirical) means, an objective, standardized calibration
has become available only recently [8]. In our earlier publications [5,9], we
proposed to tackle this difficulty by using a de novo sequencing method to
provide an objective confidence measure that is both database-independent and
takes into account spectrum-specific noise. In this paper, we provide concrete
algorithms for this purpose.

To begin, consider a spectrum $\sigma$ with parent ion mass range
$[w - \delta, w + \delta]$. We denote by $\Pi(w, \delta)$ the set of all
"possible" peptides with masses in this range. Given a peptide $\pi$ from
$\Pi(w, \delta)$, the associated quality score $S(\pi, \sigma)$ is defined by
a prescribed scoring system. The score distribution of $S(\pi, \sigma)$ within
$\Pi(w, \delta)$ naturally provides a likelihood measure for any given peptide
$\pi$ to be the correct one.

However, as described earlier [5], this seemingly straightforward idea faces
two difficulties in implementation. First, unlike the DPRM problem, for which
the function to be optimized is defined without ambiguity, the choice of the
scoring function is somewhat empirical because the parameters used in the
scoring must be trained using a training data set. Further, because of
different instruments and experimental setups, it seems impossible to design a
scoring system such that the correct peptide for each spectrum has the highest
score among all possible peptides; the application of a given scoring function
to general cases may require a leap of faith. Second, even after the scoring
function is chosen, it is not known how to find the peptide $\pi$ that
maximizes $S(\pi, \sigma)$, or how to obtain the score distribution pdf($S$)
within $\Pi(w, \delta)$, other than by the generally impractical procedure of
examining all members of $\Pi(w, \delta)$.

The first difficulty can be alleviated by validating high-scoring de novo
peptides via database searches [9] and is not the main focus of the current
paper. Note that a partial solution to the second problem via iterative
mapping, applicable when nonlocal score contributions exist, was provided
earlier [5]. Here we tackle the second problem head on for the case in which
the scoring function contains no nonlocal contribution other than a final
renormalization with respect to the peptide length. Our algorithm contains two
parts: computer memory allocation and the dynamic programming update. Prior to
discussing these two parts, however, we first address the important issue of
choosing a good mass unit.



4.1 Choosing a Good Mass Unit

The goal here is to choose a mass unit $\Delta$ and express the molecular mass
of each amino acid as an integer multiple of this unit. For example, one may
choose $\Delta$ to be 0.1 Dalton (Da) and round the molecular mass of each
amino acid to an integer multiple of 0.1 Da. Once a mass unit is chosen, all
the masses under consideration are integer multiples of this unit. It turns
out that different choices of the mass unit lead to different maximum
cumulative mass errors. As a specific example, consider using $\Delta = 0.1$
Da as the mass unit. The mass of alanine, with true mass 71.03711538 Da, is
then represented as $710\Delta$, which is 0.03711538 Da smaller than the true
molecular mass of alanine. When the integer mass representation is smaller
than the true mass, we call the mass error a down-error. The amino acid
tryptophan, with molecular mass 186.07931613 Da, is instead assigned an
integer mass of $1861\Delta$, which exceeds the true mass by
$(0.1 - 0.07931613)$ Da. We call this type of mass error an up-error.

The ratio of the mass error to the true molecular mass, multiplied by 3,000
Da, gives the maximum cumulative error that a single amino acid can induce at
a mass of 3,000 Da. For a fixed mass unit, we carried out this mass error
analysis for each of the twenty amino acids and recorded the largest up-error
and the largest down-error. The larger of the maximum up-error and the maximum
down-error is called the max-error. To search for the best mass units that
minimize the max-error at 3,000 Da, we scanned all mass units ranging from
0.005 Da to 1.005 Da in steps of $10^{-6}$ Da. Interestingly, we found a
discrete list of mass units that have smaller max-errors than their nearby
mass units. These numerically found magic mass units are summarized in table 1.
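
The error analysis just described can be sketched as follows. The sketch
assumes that each mass is rounded to the nearest integer multiple of the unit,
which is consistent with the alanine and tryptophan examples above but is not
stated explicitly in the text. The names `unit_errors` and `best_unit` are
hypothetical, and the candidate list passed to `best_unit` is supplied by the
caller.

```python
def unit_errors(masses, unit, ref_mass=3000.0):
    """Return (max up-error, max down-error) extrapolated to ref_mass, in Da.

    masses : dict residue name -> residue mass in Da
    unit   : candidate mass unit in Da
    """
    max_up = max_down = 0.0
    for mass in masses.values():
        n = round(mass / unit)                # integer mass in units of `unit`
        err = n * unit - mass                 # > 0: up-error, < 0: down-error
        scaled = abs(err) / mass * ref_mass   # error ratio times the reference mass
        if err > 0:
            max_up = max(max_up, scaled)
        else:
            max_down = max(max_down, scaled)
    return max_up, max_down

def best_unit(masses, candidates):
    """Pick the candidate unit with the smallest max-error."""
    return min(candidates, key=lambda u: max(unit_errors(masses, u)))
```

A full scan would call `best_unit` with all twenty residue masses and a fine
grid of candidate units; only the two masses quoted in the text are needed to
try the functions out.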

Once a mass unit is chosen, all the amino acid masses are effectively
integers. To obtain the score histogram of all de novo peptides when queried
by a spectrum $\sigma$ with parent molecular mass $w$ (with the N- and
C-terminal groups of the peptide stripped away), we first construct a mass
array in which index $k$ corresponds to molecular mass $k\Delta$. To encode
all possible peptides with molecular mass up to $w$, we need an array of size
$w/\Delta + 1$. Evidently, when a larger mass unit is used, the mass array is
smaller, which reduces computation time. However, as one may see from table 1,
a larger mass unit is also accompanied by a larger max-error and might not be
preferred when high mass accuracy is the first priority.

Table 1

A list of best mass units in Da. The abbreviation "m.u.e." stands for "maximum
up-error," while "m.d.e." stands for "maximum down-error." The maximum
up-error, maximum down-error, and max-error are evaluated in extrapolation to
3,000 Da as described in the text. The abbreviation "a.a.w.m.u." stands for
"amino acid with maximum up-error," while "a.a.w.m.d." stands for "amino acid
with maximum down-error."



4.2 Efficient Memory Allocation

The basic idea of our algorithm is to encode all possible peptides in the mass
array by linking pointers, analogous to the consecutive hopping steps in the
1D hopping model. For an amino acid $a$, let $n(a)$ represent its
corresponding integer mass in units of $\Delta$. A peptide made of
$[a_1, a_2, \ldots, a_M]$ has a hopping trajectory in the mass array given by
$[0, x_1, x_2, \ldots, x_M]$ with $x_{i \ge 1} = \sum_{j=1}^{i} n(a_j)$. Let
us also denote $x_M$ by $x_F$ to indicate that it is the terminating point of
the path. Evidently, all possible peptides with molecular mass equal to
$x_F \Delta$ have corresponding hopping paths starting at the origin and
terminating at $x_F$. Through appropriate pointer linking, one may therefore
encode all possible peptides with molecular mass $x_F \Delta$ in a
one-dimensional mass array.

For a given spectrum $\sigma$, depending on the score function used, one may
calculate the local score contributions at each mass index. This step is done
only once for the whole mass array and need not be repeated for each candidate
peptide. In a typical experimental MS$^2$ spectrum, there always exists some
level of parent ion mass uncertainty. Once the size of the mass uncertainty is
specified, we only need to examine de novo peptides whose corresponding
hopping paths terminate at a few consecutive mass indices. This indicates that
some of the mass indices of the aforementioned mass array may not even be used
in this context. Below we describe how to efficiently obtain the relevant mass
indices and allocate computer memory only for those masses.

Assume that the possible terminating points are $F_1, F_2, \ldots, F_k$ with
$F_{j+1} = F_j + 1$. The update rules described in eqs. (2-3), (5), and (7-8)
are also used at this stage. The following pseudocode describes our algorithm.

Initialize the mass_index = 0 entry:
    S_best = S_worst = L_max = L_min = 0; C = 1;

REMARK: Max_aa is the maximum number of amino acids considered
for (aa_index = 0; aa_index < Max_aa; aa_index++) {
    label occupancy of n(aa_index);
    at n(aa_index) attach a pointer back to 0;
    update S_best, S_worst, L_max, L_min, C at n(aa_index);
}

for (mass_index = 1; mass_index <= F_k; mass_index++) {
    if (mass_index occupied ?) {
        for (aa_index = 0; aa_index < Max_aa; aa_index++) {
            label occupancy of (mass_index + n(aa_index));
            at mass_index + n(aa_index) attach a pointer to mass_index;
            update S_best, S_worst, L_max, L_min, C at (mass_index + n(aa_index));
        }
    }
}

for (mass_index = F_k; mass_index >= F_1; mass_index--) {
    backtrack all possible paths -> final occupied entries;
}

The last step in the algorithm above identifies the relevant mass_indices,
i.e., those that will be traversed by the hopping paths of all peptides with
molecular masses in the range $[F_1 \Delta, F_k \Delta]$. We only need to
allocate computer memory associated with those sites. For each of these
relevant sites, the algorithm above also gives us the values of
$S_{\rm best}$, $S_{\rm worst}$, $L_{\max}$, $L_{\min}$, and the total number
of peptides reaching that site. One may therefore allocate a 2D array of size
$(S_{\rm best}(i) - S_{\rm worst}(i))/\eta \times (L_{\max}(i) - L_{\min}(i))$
for each relevant mass_index $i$ for later use.
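
One possible rendering of the pseudocode above in Python is sketched below. It
is an illustration under stated assumptions, not the authors' implementation:
pointers are stored as lists of parent indices, the local scores are integers
in units of $\eta$, and the names `relevant_sites`, `steps`, and `finals` are
hypothetical.

```python
import math

def relevant_sites(s, steps, finals):
    """Forward pass plus backtracking over the mass array.

    s      : dict mass_index -> local score (missing -> 0)
    steps  : integer residue masses n(a)
    finals : consecutive terminating indices F_1..F_k
    Returns dict site -> (S_best, S_worst, L_max, L_min, C) for every site
    lying on at least one path from 0 to a terminating index.
    """
    top = max(finals)
    info = {0: (0, 0, 0, 0, 1)}
    parents = {0: []}
    for x in range(top + 1):          # ascending: info[x] is final when visited
        if x not in info:
            continue                  # unoccupied mass index
        sb, sw, lmax, lmin, c = info[x]
        gain = s.get(x, 0)            # score collected when leaving x
        for n in steps:
            y = x + n
            if y > top:
                continue
            ob, ow, omax, omin, oc = info.get(
                y, (-math.inf, math.inf, 0, math.inf, 0))
            info[y] = (max(ob, sb + gain), min(ow, sw + gain),   # eqs. (2), (3)
                       max(omax, lmax + 1), min(omin, lmin + 1), # eqs. (7), (8)
                       oc + c)                                   # eq. (5)
            parents.setdefault(y, []).append(x)   # pointer back to x
    # Backtrack from the occupied terminating indices along the pointers.
    keep, stack = set(), [f for f in finals if f in info]
    while stack:
        x = stack.pop()
        if x not in keep:
            keep.add(x)
            stack.extend(parents[x])
    return {x: info[x] for x in sorted(keep)}
```

Only the sites returned here would need a 2D histogram in the later update,
which is where the memory saving comes from.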
4.3 Main Algorithm and Some Results

Once memory allocation for relevant mass_indices is done, we can efficiently 
go through those relevant sites to obtain the 2D score histogram that we 
mentioned. In the pseudocode below, update is performed using eq. (6). We 
now demonstrate the very simple main algorithm 

Initialize all the fine counters C(x,N,L) = 0
    except C(x = 0, N = 0, L = 0) = 1;

for (aa_index = 0; aa_index < Max_aa; aa_index++) {
    update C(x,N,L) at x = n(aa_index);
}

for (mass_index in ascendingly ordered relevant mass_indices) {
    for (aa_index = 0; aa_index < Max_aa; aa_index++) {
        update C(x,N,L) at x = (mass_index + n(aa_index));
    }
}

We now define the final 2D counter

$$ Y(N, L) = \sum_{i=1}^{k} C(F_i, N, L) . \tag{9} $$

Evidently, in the 1D hopping model with $k$ consecutive terminating points
allowed, the resulting density of states $D(E)$ can now be expressed as
$D(E = -N\eta) = \sum_L Y(N, L)/\eta$. If one is interested in normalizing the
final score in a path-length dependent manner, one has the following generic
transformation:

$$ H(E) = \sum_L \int dE' \, \frac{Y(E' = -N\eta, L)}{\eta} \, \delta\left(E - f(E', L)\right) , \tag{10} $$

where $f(E', L)$ is a generic length-normalized energy function that takes the
raw energy $E'$ and the number of hopping steps $L$ and turns them into a new
energy $f(E', L)$, and $\int dE' \to \eta \sum_N$ is understood.
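
In discrete form, eqs. (9) and (10) amount to merging the fine counters of the
terminating points and re-binning each entry at its transformed energy. The
sketch below assumes the fine counters are stored as mappings from $(N, L)$ to
path counts; `combine_terminals`, `transformed_histogram`, and `bin_width` are
hypothetical names. It returns counts per bin, so dividing by `bin_width`
gives a density.

```python
import math
from collections import Counter

def combine_terminals(fine, finals):
    """Eq. (9): add the fine counters C(F_i, N, L) over the terminating points."""
    total = Counter()
    for f in finals:
        total.update(fine[f])         # Counter.update adds the counts
    return total

def transformed_histogram(y, f, eta, bin_width):
    """Discrete form of eq. (10): histogram of f(E', L) with E' = -N * eta.

    y : Counter mapping (N, L) -> number of paths
    f : callable (raw_energy, L) -> new energy
    Returns a Counter mapping bin index -> number of paths.
    """
    out = Counter()
    for (n, length), count in y.items():
        new_energy = f(-n * eta, length)
        out[math.floor(new_energy / bin_width)] += count
    return out
```

For instance, passing `f = lambda e, L: e / (2 * (L - 1))` applies the
length normalization quoted earlier for RAId_DbS, provided every path has
$L \ge 2$.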

Using a real experimental MS$^2$ spectrum of parent ion mass $2254.7 \pm 3.0$
Da and a raw scoring function (the RAId_DbS [7] raw score without the division
by $2(L - 1)$, with $L$ being the peptide length), we obtained a 2D score
histogram. From this 2D score histogram, we can compute the average peptide
length $\langle L \rangle$ as well. We then transform the 2D score histogram
using two different $f$ functions. In the first case,
$f(E', L) = E'/[2(\langle L \rangle - 1)]$, meaning that one just divides the
score by the constant $2(\langle L \rangle - 1)$. In the second case, we use
the RAId_DbS scoring function, where $f(E', L) = E'/[2(L - 1)]$. In Figure 1,
we show the two resulting score histograms along with the fits to the
theoretical distribution function [7]. As one may see from the figure, both
histograms are well fitted by the theoretical distribution function over at
least 15 orders of magnitude. There is a difference, however, between the
histograms obtained. In the first case, where the score is merely divided by a
constant set by the average length, we have a wider score distribution than in
the second case. This implies that a high-scoring hit under the first type of
scoring function will have a larger P-value than under the second type. This
is perfectly reasonable because, with the first type of scoring, very long
peptides, which are more likely to hit fragment peaks in the mass spectrum by
random chance, are penalized less than the shorter peptides. As a consequence,
one anticipates more false long peptides from the first scoring method than
from the second. Therefore, one should assign a larger P-value in the former
case and a smaller P-value in the latter case. It is evidently important to be
able to obtain score histograms for the second scoring method. However, this
can be achieved only if one keeps the length information in the dynamic
programming update, see eq. (6).

Figure 1. Score histograms for raw score and RAId_DbS score. Note that the two
histograms cross each other in the large score regime, indicating that the raw
score function might not be as effective as the RAId_DbS score; see text for
details.



5 Discussion, Summary, and Outlook 

Our method may also be extended to other applications. In the case of
passenger routing, the $x$-axis actually represents time. The local score may
be viewed as the additional cost, which may vary for different stops. Once the
problem is laid out, the 2D histogram obtained from our solution indicates the
number of equivalent routes in terms of additional costs and the total number
of stops. This problem should be interesting in its own right.

In this paper, we developed a new approach to obtain the density of states of
a 1D hopping problem in a random potential. We extended the simplest case
scenario and showed that this method can provide a complete score histogram
for the MS-based peptide sequencing problem. This important information may be
used for a more objective statistical significance assignment in peptide
identification. Our algorithm may also serve as a speedy de novo algorithm. If
one is interested only in the best scoring peptide with a length-normalized
score, one only needs to keep track of $S_{\rm best}(x, L)$. Furthermore, it
is straightforward to include post-translationally modified amino acids in our
de novo algorithm. The effect is simply an enlargement of the alphabet. That
is, instead of having 20 amino acids, we simply have more allowed masses,
without needing to change any part of the algorithm.

In the near future, we would like to build a web application that allows users
to obtain information of interest. For example, a user might be interested in
knowing how many de novo peptides there can be for a given parent ion
molecular mass and mass error tolerance. Furthermore, we plan to provide users
with the full score histogram when a query spectrum is provided and a scoring
method is chosen. Our approach, founded on statistical physics, can easily
address these types of questions to provide useful information for biological
research.